A chromosome-level genome for the flower thrips Frankliniella intonsa

The flower thrips Frankliniella intonsa (Thysanoptera: Thripidae) is a common insect found in the flowers of many plants. It occasionally damages crops through direct feeding and the transmission of plant viruses. Here, we assembled a chromosome-level genome of F. intonsa using Illumina, Oxford Nanopore Technologies (ONT), and Hi-C sequencing. The assembled genome had a size of 209.09 Mb, with a contig N50 of 997 bp, a scaffold N50 of 13.415 Mb, and a BUSCO completeness of 92.5%. The assembled contigs were anchored on 15 chromosomes. A set of 14,109 protein-coding genes was annotated in the genome, with a BUSCO completeness of 95.0%. The genome contained 491 non-coding RNAs and 0.57% interspersed repeats. This high-quality genome provides a valuable resource for understanding the ecology, genetics, and evolution of F. intonsa, as well as for controlling thrips pests.


Background & Summary
Thrips are small insects of the order Thysanoptera. Among the currently described thrips, only about 150 species are recognized as pests [1]. The flower thrips Frankliniella intonsa is a common species found in the flowers of many plants. It is native to Eurasia but has now been introduced to Oceania and North America [2][3][4][5][6]. Although its small body size allows for easy dispersal, the distribution of F. intonsa remains limited compared to that of a cosmopolitan pest from the same genus, the western flower thrips Frankliniella occidentalis [7][8][9]. In its native range, F. intonsa was at times reported as a pest [10] but was often found alongside other thrips in the field, leading to species competition and displacement [11][12][13][14][15]. However, in recent years, F. intonsa has been more frequently treated as a pest of crops [13,16]. In some regions, F. intonsa has developed resistance to the insecticides used for its control [17,18]. In addition, F. intonsa has been found to be a vector of plant viruses of the genus Tospovirus [19][20][21], although its transmission efficacy is lower than that of F. occidentalis [11]. Therefore, we need to understand its biology, ecology, and evolution, as well as its competition with other species, to reassess the pest status of F. intonsa and develop a proper control strategy [22,23].

Well-assembled genomes will provide genetic resources for the study of F. intonsa. Currently, genomes of thrips have been reported for the western flower thrips Frankliniella occidentalis [24], tobacco thrips Frankliniella fusca [25], melon thrips Thrips palmi [26], bean flower thrips Megalurothrips usitatus [27,28] and rice thrips Stenchaetothrips biformis [29]. Recently, a parallel study of ours published a genome for F. intonsa that represents the first chromosome-scale genome for a species of the genus Frankliniella [30]. The specimens used for that F. intonsa genome were collected from Zhejiang Province in southern China [30]. Here, we assembled another chromosome-level genome for F. intonsa, sequenced from specimens collected in Inner Mongolia in northern China, to enrich the genetic resources of this species. We used Illumina short-read sequences to estimate the genome features of F. intonsa. We also employed Oxford Nanopore Technologies (ONT) long-read sequences to assemble a contig-level genome. Furthermore, we used chromosome conformation capture (Hi-C) technology to assemble these contigs into a chromosome-level genome.

Methods
Sample collection and genomic DNA sequencing. The strain used for genome sequencing was reared for 10 generations in the laboratory at the College of Forestry, Inner Mongolia Agricultural University, Hohhot, China. About 100 unsexed adults collected from Huanghuagou Scenic Area in Chaha'er Right Wing Central Banner, Inner Mongolia, China (E 112°32′03″, N 41°08′17″) were used to establish the strain. Frankliniella intonsa was reared on seedlings of horsebean (Vicia faba) under the following laboratory conditions: 25 °C, 60% relative humidity and a 16L:8D photoperiod. The specimens used for sequencing were morphologically identified to avoid the inclusion of other thrips species. About 1,000 pooled male and female adults were used for the extraction of high-molecular-weight DNA (HMW DNA) and subsequent library construction. Genomic DNA was extracted from the entire bodies of the pooled individuals using the Qiagen MagAttract HMW DNA Mini Kit, following the manufacturer's protocol. A short-read DNA library with an insert size of 500 bp was constructed using the Illumina TruSeq DNA PCR-Free HT LPK and sequenced on the Illumina X Ten platform (Illumina Inc., San Diego, CA, USA). A long-read DNA library with an insert size of 23 kb was prepared according to the manufacturer's protocol and sequenced on the ONT PromethION platform. The short reads were used for genome survey analysis (estimating the genome size and the rates of heterozygosity and duplication) and for correcting the long-read assembly, while the long reads were used for the contig-level genome assembly. Sequencing generated 15.55 Gb (73.88X coverage) of clean short-read data and 28.35 Gb (135.65X coverage) of long-read data (Table 1).
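
The coverage values quoted above follow from dividing the sequenced bases by a genome size. The text does not state which genome size was used as the denominator, so the minimal Python sketch below simply back-calculates it from the reported yields and depths; the function names are illustrative and not part of any pipeline used in this study.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, depth: float) -> float:
    """Genome size implied by a reported data yield and a reported depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth


# Yields and depths as reported in the text (Gb converted to bases).
short_read_size = implied_genome_size(15.55e9, 73.88)
long_read_size = implied_genome_size(28.35e9, 135.65)

print(f"short reads imply {short_read_size / 1e6:.2f} Mb")
print(f"long reads imply  {long_read_size / 1e6:.2f} Mb")
```

Both implied sizes fall close to the 209.09 Mb anchored assembly length reported in this study.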

Hi-C library construction and sequencing. The chromosome conformation of the genome was captured to determine the order and orientation of the contigs. Approximately 1,000 adults of mixed sex were used for constructing the Hi-C library. The specimens were ground and then cross-linked in fresh, ice-cold nuclear isolation buffer with a 2% formaldehyde solution for 10 minutes at room temperature. The fixed cells were digested with DpnII (NEB) and processed according to the standard operating procedure for Hi-C library construction, which included cell lysis, incubation, labelling of the DNA ends with biotin-14-dCTP, and blunt-end ligation of the cross-linked fragments. The Hi-C library was amplified by 12-14 PCR cycles and sequenced on the Illumina NovaSeq 6000 platform. A total of 26.97 Gb of clean data were generated, representing 120.05X coverage of the genome (Table 1).

Genome characteristics estimation. Genome characteristics were estimated based on the Illumina short reads. The raw sequences were trimmed using fastp [31] under the default parameters. KMC version 3.0 [32] was used to compute K-mer distribution histograms from the trimmed data for K = 17, 21, 27, 31 and 41, with parameters '-m96 -ci1 -cs10000' and '-cx10000'. The genome size, heterozygosity rate, and duplication rate were estimated using GCE version 2.0 under the default parameters [33]. The estimated genome size decreased as K increased, ranging from 230 Mb to 255 Mb, similar to a previous study of this species [30]. The estimated duplication rate also decreased as K increased, with values ranging from 2.71% to 3.22%, higher than in a previous study of this species (2.04%) [30]. Each K-mer distribution showed double peaks, indicating a highly complex genome (Table 2, Fig. 1).
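
The idea behind a K-mer based genome size estimate can be sketched in a few lines of Python. This is a simplified textbook calculation (total K-mers divided by the depth of the main histogram peak), not the model implemented in GCE, and it does not reproduce KMC's counting; all function names and the `min_depth` cutoff are assumptions made for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k: int) -> Counter:
    """Map each depth to the number of distinct canonical k-mers seen at that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth: int = 2) -> float:
    """Total k-mers divided by the depth of the main peak.

    Depths below min_depth are dropped as likely sequencing errors
    (an assumed cutoff, chosen only for this sketch).
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=lambda depth: (usable[depth], depth))
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error tail at depth 1, a main peak at depth 20,
# and a few duplicated k-mers at depth 40.
toy_hist = Counter({1: 50, 20: 1000, 40: 10})
print(estimate_genome_size(toy_hist))
```

With real data, a second peak at half or double the main depth (as in Fig. 1) makes the choice of peak less obvious, which is one reason dedicated tools fit a model rather than taking the maximum.
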

Genome assembly and annotation. The ONT long reads were quality-controlled and assembled into contigs using a "correct-then-assemble" strategy in nextDenovo version 2.5.2 [34] with parameters 'read_cutoff = 1k, genome_size = 400m, pa_correction = 20, sort_options = -m 100g -t 10, minimap2_options_raw = -t 10, correction_options = -p 15, minimap2_options_cns = -t 10, nextgraph_options = -a 1'. These contigs were then polished three times with the Illumina short reads using pilon version 1.22 [35] under the default parameters. The polished contigs were further assembled into a chromosome-level genome using the Hi-C sequencing data. Low-quality reads and adapters from the Hi-C library were filtered using Trimmomatic version 0.39 [36] under the default parameters, and the remaining reads were mapped to the assembled contigs using Juicer [37] with default parameters. The contigs were grouped into chromosomes using 3D de novo assembly (3D-DNA) version 180922 with parameters '-editor_repeat_coverage = 15, -r 2' [38]. Misassemblies were manually corrected in Juicebox version 2.16.00 (https://github.com/aidenlab/Juicebox), and the raw chromosomes were updated by rerunning the 3D-DNA script "run-asm-pipeline-post-review.sh". Finally, the repeat-masked high-quality genome assembly was submitted to the online tool Helixer [39] under the invertebrate mode for structural annotation. Functional annotation was performed by searching the predicted proteins against the UniProt/Swiss-Prot database using blastp version 2.12.0+ [40] with the following parameters: '-evalue 0.000001 -outfmt 6 -num_threads 128 -num_alignments 1 -seg yes -soft_masking true -lcase_masking -max_hsps 1'. In total, 422,839 contigs were assembled into 15 chromosomes (Fig. 2). The largest chromosome was 21.406 Mb and the smallest was 10.106 Mb. We numbered the chromosomes in descending order of size. The total length of the anchored genome was 209.09 Mb with an N50 of 13.415 Mb. About 57 Mb of contigs were not anchored to any chromosome. The anchored genome is shorter than both the estimated genome size and a previously assembled genome for this species [30]. Both anchored and unanchored contigs were submitted to GenBank with accession numbers CM069028.1-CM069042.1. In total, 14,109 protein-coding genes (PCGs) were identified, of which 9,931 have functional annotation [41]. The G + C content of the final genome assembly was 51.75% (Table 2).

Repeat elements and non-coding RNA predictions. Repetitive elements longer than 1,000 bp were identified against the Insecta repeats in RepBase Update (20120418). The identification was performed using RepeatMasker version open-4.0.0 [42] (-no_is -norna -xsmall -q) with the search engine RM-BLAST (v2.2.23+).
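
Summary statistics of the kind reported above (N50, G + C content, and the repeat fraction of a soft-masked assembly) can be recomputed from a FASTA file. The following Python sketch is illustrative only: the function names and the toy input are assumptions, and it relies on the fact that RepeatMasker's -xsmall option writes masked repeats in lowercase.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequences given")


def gc_fraction(seqs: List[str]) -> float:
    """G + C as a fraction of unambiguous (A, C, G, T) bases, ignoring case."""
    gc = acgt = 0
    for seq in seqs:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        gc += g_c
        acgt += g_c + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def soft_masked_fraction(seqs: List[str]) -> float:
    """Fraction of bases written in lowercase, i.e. soft-masked as repeats."""
    masked = sum(1 for seq in seqs for ch in seq if ch.islower())
    total = sum(len(seq) for seq in seqs)
    return masked / total if total else 0.0


# Toy input standing in for a soft-masked assembly.
toy_fasta = [">a", "ACGTacgt", "GGCC", ">b", "nnNNAT"]
seqs = [seq for _, seq in parse_fasta(toy_fasta)]
print(n50([len(s) for s in seqs]), gc_fraction(seqs), soft_masked_fraction(seqs))
```

For a real assembly, the lines would come from an open file handle, for example `parse_fasta(open(path))` with a hypothetical `path` to the FASTA file.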

Technical Validation
The extracted high-molecular-weight (HMW) DNA had an average size of approximately 23 kb, as determined by pulsed-field gel electrophoresis. To assess the integrity and quality of the genome assembly and the set of protein-coding genes, Benchmarking Universal Single-Copy Orthologs (BUSCO) version 5.4.5 [52] was used. For the chromosome-level genome assembly, the BUSCO completeness was 93.3%, 95.6%, 96.1% and 95.0% based on the Eukaryota, Metazoa, Arthropoda and Insecta datasets (odb_10, released on 2024-01-08), respectively, while the previously assembled genome has a completeness of 96.9%-98.8% [30]. For the protein-coding gene set, the BUSCO completeness was 93.0%, 94.6%, 96.3% and 95.2% based on the Eukaryota, Metazoa, Arthropoda and Insecta datasets, respectively, while the previously assembled genome has a completeness of 89.5%-94.4% [30]. We mapped our Illumina short reads to both assembled genomes using BWA version 0.7.17-r1198-dirty [53] with the BWA-MEM algorithm. The mapping rates of our short reads to our unmasked chromosome-level genome and to that of Zhang et al. [30] were 81.92% and 87.30%, respectively. We also mapped the Illumina short reads of Zhang et al. [30] and obtained a mapping rate of 84.04% for our genome assembly and 92.80% for the assembly of Zhang et al. [30].
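
As a reminder of how a BUSCO completeness percentage is built up, the sketch below computes it from the four BUSCO categories: complete single-copy (S), complete duplicated (D), fragmented (F) and missing (M), with completeness C = S + D expressed as a share of all searched orthologs. The counts are toy values chosen for illustration and are not results from this study; the class and method names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    @property
    def complete(self) -> int:
        # Completeness counts both single-copy and duplicated orthologs.
        return self.single + self.duplicated

    def percent(self, count: int) -> float:
        return 100.0 * count / self.total

    def summary(self) -> str:
        """One-line summary in the familiar C/S/D/F/M notation."""
        return (
            f"C:{self.percent(self.complete):.1f}%"
            f"[S:{self.percent(self.single):.1f}%,"
            f"D:{self.percent(self.duplicated):.1f}%],"
            f"F:{self.percent(self.fragmented):.1f}%,"
            f"M:{self.percent(self.missing):.1f}%,"
            f"n:{self.total}"
        )


# Toy counts only (not taken from the assembly described here).
toy = BuscoCounts(single=930, duplicated=20, fragmented=15, missing=35)
print(toy.summary())
```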

Fig. 1 Estimated characteristics of the Frankliniella intonsa genome based on Illumina short-read data. Results were obtained in GenomeScope version 2.0 with 17- (A), 21- (B), 27- (C), 31- (D) and 41-mers (E). The K-mer distributions showed double peaks: the first peak indicates genome duplication and the highest peak represents the genome size peak. len, estimated genome size in bp; aa, homozygosity rate; ab, heterozygosity rate; dup, duplication rate.

Fig. 2 Genome-wide contact matrix of Frankliniella intonsa generated using Hi-C data. Each blue square represents a chromosome and each green square represents a contig. Fifteen chromosomes were anchored under the default parameters of the Juicer and 3D-DNA software. Numbers on the top and left axes show the chromosome length in Mb; numbers on the bottom axis show the chromosome number. Chromosomes are numbered by size, from the largest to the smallest.

Table 1. Sequencing data generated in this study for genome assembly of Frankliniella intonsa.

Table 2. Statistics for chromosomal-level assembly and annotation of Frankliniella intonsa genome.

Table 3. Repeated elements identified in the Frankliniella intonsa genome.

Table 4. Non-coding RNA identified in the Frankliniella intonsa genome.